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ABSTRACT 

We present a robust and fast algorithm for performing astrometry and source cross-identification 
on two dimensional point lists, such as between a catalogue and an astronomical image, or between 
two images. The method is based on minimal assumptions: the lists can be rotated, magnified 
and inverted with respect to each other in an arbitrary way. The algorithm is tailored to work 
efficiently on wide fields with large number of sources and significant non-linear distortions, as long 
as the distortions can be approximated with linear transformations locally, over the scale-length of 
the average distance between the points. The procedure is based on symmetric point matching in a 
newly defined continuous triangle space that consists of triangles generated by an extended Delaunay 
triangulation. Our software implementation performed at the 99.995% success rate on ~ 260,000 
frames taken by the HATNet project. 

Subject headings: methods: data analysis - astrometry - astronomical data bases: catalogs - surveys 



1. INTRODUCTION 

Cross-matching two two-dimensional points lists is a 
crucial step in astrometry and source identification. The 
tasks involves finding the appropriate geometrical trans- 
formation that transforms one list into the reference 
frame of the other, followed by finding the best match- 
ing point-pairs. One of the lists usually contains the 
pixel coordinates of sources in an astronomical image 
(e.g. point- like sources, such as stars), while the other 
list can be either a reference catalog with celestial coor- 
dinates, or it can also consist of pixel coordinates that 
originate from a different source of observation (another 
image). Throughout this paper we denote the reference 
(list) as TZ, the image (list) as T, and the function that 
transforms the reference to the image as Tr-^i. 

The difficulty of the problem is that in order to find 
matching pairs, one needs to know the transformation, 
and vica versa: to derive the transformation, one needs 
point-pairs. Furthermore, the lists may not fully overlap 
in space, and may have only a small fraction of sources 
in common. 

By making simple assumptions on the properties of 
J-r^i, however, the problem can be tackled. A very 
specific case is when there is only a simple translation 
between the lists, and one can use c ross-correlation tech- 
niques (see iPhillips k. Davis! 119951) to find the trans- 
formation. We note, t hat the a method proposed by 
iThiebaut & Boerl 1)2001(1 uses the whole image informa- 
tion to derive a transformation (translation and magni- 
fication). 

A more general assumption, typical to astronomical 
applications, is a that Tr^i is a similarity transforma- 
tion (rotation, magnification, inversion, without shear), 
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i.e. Tr^i = AAr + b, where A is a (non-zero) scalar 
A times the orthogonal matrix, b is an arbitrary transla- 
tion, and r is the spatial vector of points. Exploiting that 
geometrical patterns remain similar after the transforma- 
tion, more general algorithms have been developed that 
are b ased on pattern matching l|Grotbll986HValdes et alJ 
11995(1 . The idea is that the initial transformation is found 
by the aid of a specific set of patterns that are generated 
from a subset of the points on both TZ and X. For ex- 
ample, the subset can be that of the brightest sources, 
and the patterns can be triangles. With the knowledge 
of this initial transformation, more points can be cross- 
matched, and the transformation between the lists can 
be iteratively refined. Some of these m ethods are imple- 
ment ed as an iraf task 3 in immatch j Phill ips fc Davisl 
1995). 

The above pattern matching methods perform well 
as long as the dominant term in the transformation is 
linear, such as for astrometry of narrow field-of-view 
(FOV) images, and as long as the number of sources 
is small (because of the large number of patterns that 
can be generated - see later). In the past decade of 
astronomy, with the development of large format CCD 
cameras or mosaic imagers, many wide-field surveys 
appeared, such as those looking fo r transient events 
(e.g. ROTSE — lAkerlof et all I2000D. transitin g plan- 
e ts (e.g. Kelt -iPepper. Gould, fc Depovll200l TrE S - 
-lAlnnso et a]Jl20f)4 HATNet - IBakosI 120021 I2004 see 
Char bonneaul2 006 for further reference s) , or all-sky vari- 
ability (e.g. ASAS - lPoimaiisk]|1997|) . There are non- 
negligible, higher order distortion terms in the astromet- 
ric solution that are due to, for instance, the projection 
of celestial to pixel coordinates and the properties of the 

3 IRAF is distributed by the National Optical Astronomy Ob- 
servatories, which are operated by the Association of Universities 
for Research in Astronomy, Inc., under cooperative agreement with 
the National Science Foundation. 
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fast focal ratio optical systems. Furthermore, these im- 
ages may contain ~ 10 5 sources, and pattern matching 
is non-trivial. 

These surveys necessitated a further generalization of 
the algorithm, which we present in this paper. To be 
more specific, we were motivated by the astrometric re- 
quirements of the Hungarian-made Automated Telescope 
Network (HATNet). Each HAT telescope of the Network 
consists of a / = 200mm, d = //1.8 telephoto lens and 
a 2K x 2K CCD yielding an 8° x 8° FOV. In our expe- 
rience, we need at least 4th order polynomial functions 
of the pixel coordinates in order to properly describe the 
distortion of the lens. With a typical exposure time of 5 
minutes in I-band, in a moderately dense field (b ~ 15°) 
there are 30000 stars brighter than 1=13 for which better 
than 10% photometry can be achieved. If we consider all 
3-tr detections, we have to deal with the identification of 
-100,000 sources. 

The algorithm presented in this paper is based on, and 
is a generalization of the above pattern matching algo- 
rithms. It is very fast, and works robustly for wide-field 
imaging with minimal assumptions. Namely, we assume 
that: i) the distortions are non-negligible, but small com- 
pared to the linear term, ii) there exists a smooth trans- 
formation between the reference and image points, iii) 
the point lists have a considerable number of sources in 
common, and iv) the transformation is locally invertible. 
The paper is presented as follows. First we describe sym- 
metrical point matching in § before we go on to the 
discussion of finding the transformation (§ - The soft- 
ware implementation and its performance on a large and 
inhomogeneous dataset is demonstrated in § 0] Finally, 
we draw conclusions in § 

2. SYMMETRIC POINT MATCHING 

First, let us assume that Tr-^i is known. To find 
point-pairs between 1Z and I one should first transform 
the reference points to the reference frame of the image: 
1Z' = Tr^i(JZ). Now it is possible to perform a simple 
symmetric point matching between TV and X. One point 
(i?i S TV) from the first and one point (1\ £ X) from the 
second set are treated as a pair if the closest point to R\ 
is I\ and the closest point to I\ is R\. This requirement 
is symmetric by definition and excludes such cases when 
e.g. the closest point to R\ is I\, but there exists an i?2 
that is even closer to I\, etc. 

In one dimension, finding the point of a given list near- 
est to a specific point (x) can be implemented as a bi- 
nary search. Let us assume that the point list with N 
points is ordered in ascending order. This has to be done 
only once, at the beginning, and using the quicksort al- 
gorithm, for example, the required time scales on average 
as 0(N log N). Then x is compared to the median of the 
list: if it is less than the median, the search can be con- 
tinued recursively in the first N/2 points, if it is greater 
than the median, the second N/2 half is used. At the end 
only one comparison is needed to find out whether x is 
closer to its left or right neighbor, so in total 1 + log 2 (A) 
comparisons are needed, which is an O(logA) function 
of N. Thus, the total time including the initial sorting 
also goes as O(NlogN). 

As regards a two dimensional list, let us assume again, 
that the points are ordered in ascending order by their 
x coordinates (initial sorting — O(AlogA)), and they 



are spread uniformly in a square of unit area. Finding 
the nearest point in x coordinate also requires 0(log A) 
comparisons, however, the point found presumably will 
not be the nearest in Euclidean distance. The expecta- 
tion value of the distance between two points is 1/yN, 
and thus we have to compare points within a strip with 
this width and unity height, meaning 0(^/N) compar- 
isons. Therefore, the total time required by a symmetric 
point matching between two catalogs in two dimensions 
requires 0(N 3 / 2 log N) time. 

We note that finding the closest point within a given 
set of points is also known as n earest neighbor problem 
(for a summary see Gionis 2002, and references therein). 
It is possible to reduce the computation time in 2 di- 
mensions to O(NlogN) by the aid of Voronoi diagrams 
and Voronoi cells, but we have not implemented such an 
algorithm in our matching codes. 

3. FINDING THE TRANSFORMATION 

Let us go back to finding the transformation between 7Z 
and X. The first, and most crucial step of the algorithm 
is to find an initial "guess" FrL>i for the transformation 

based on a variant of triangle matching. Using FrI+j, TZ 
is transformed to X, symmetric point-matching is done, 
and the paired coordinates are used to further refine the 
transformation (leading to Fr^t in iteration z), and in- 
crease the number of matched points iteratively. A major 
part of this paper is devoted to finding the initial trans- 
formation. 

3.1. Triangle matching 

It w as proposed earlier by iGrothl (Il98fifl and iStetsonl 
(1989), and recently by others fsee iValdes et al.l ll995) 
to use triangle matching for the initial "guess" of the 
transformation. The total number of triangles that can 
be formed using N points is N(N — 1)(A — 2)/6, an 
0(N 3 ) function of N. As this can be an overwhelming 
number, one can resort to using a subset of the points 
for the vertices of the triangles to be generated. One can 
also limit the parameters of the triangles, such as exclude 
elongated or large (small) triangles. 

As triangles are uniquely defined by three parameters, 
for example the length of the three sides, these param- 
eters (or their appropriate combinations) naturally span 
a 3-dimensional triangle space. Because our assumption 
is that Tr^i is dominated by the linear term, to first 
order approximation there is a single scalar magnifica- 
tion between 1Z and X (besides the rotation, chirality 
and translation). It is possible to reduce the triangle 
space to a normalized, two-dimensional triangle space 
{{T Xl T y ) £ T), whereby the original size information is 
lost. Similar triangles (with or without taking into ac- 
count a possible flip) can be represented by the same 
points in this space, alleviating triangle matching be- 
tween 1Z and X. 

3.1.1. Triangle spaces 

There are multiple ways of deriving normalized trian- 
gle spaces. One can define a "mixed" normalized trian- 
gle space T<- mix \ where the coordinates are insensitive 
to inversion between the original coordinate lists, i.e. all 
similar triangles are represented by the same point irre- 
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Fig. 1. — The position of triangles in the mixed and the chiral 
triangle spaces. The exact position of a given triangle is represented 
by its center of gravity. Note that in the mixed triangle space 
some triangles with identical side ratios but different orientation 
overlap. The dashed line shows the boundaries of the triangle 
space. The dotted-dashed line represents the right triangles and 
separates obtuse and acute ones. 

spective of their chirality ijValdes et al.lll995j) : 

Tj mix) =p/a, (1) 
T^=q/a, (2) 

where a, p and q are the sides of the triangle in descend- 
ing order. Triangles in this space are shown on the left 
panel of Fig. ^ Coordinates in the mixed triangle space 
are continuous functions of the sides (and therefore of 
the spatial coordinates of the vertices of the original tri- 
angle) but the orientation information is lost. Because 
we assumed that J-r—,1 is smooth and bijective, no local 
inversions and flips can occur. In other words, 1Z and X 
are either flipped or not with respect to each other, but 
chirality does not have a spatial dependence, and there 
are no "local spots" that are mirrored. Therefore, using 
mixed triangle space coordinates can yield false triangle 
matchings that can lead to an inaccurate initial trans- 
formation, or the match may even fail. Thus, for large 
sets of points and triangles it is more reliable to fix the 
orientation of the transformation. For example, first as- 
sume the coordinates are not flipped, perform a triangle 
match, and if this match is unsatisfactory, then repeat 
the fit with flipped triangles. 

This leads to the definition of an alternative, "chiral" 
triangle space: 



Ti chir) =6/a, 
T^=c/a, 



(3) 
(4) 



where a, b and c are the sides in counter-clockwise or- 
der and a is the longest side. In this space similar trian- 
gles with different orientations have different coordinates. 
The shortcoming of 7 1 ( chlr ) is that it is not continuous: a 
small perturbation of an isosceles triangle can result in 
a new coordinate that is at the upper rightmost edge of 
the triangle space. 

In the following, we show that it is possible to define 
a parametrization that is both continuous and preserves 
chirality. Flip the chiral triangle space in the right panel 
of Fig. H along the T x +T y = 1 line. This transformation 
moves the equilateral triangle into the origin. Following 
this, apply radial magnification of the whole space to 
move the T x + T y = 1 line to the T 2 + T 2 = 1 arc (the 
magnification factor is not constant: 1 along the direction 
of x and y-axis and \[2 along the T x — T y line) . Finally, 
apply an azimuthal slew by a factor of 4 to identify the 
T y = 0, T x > and T x = 0, T y > edges of the space. To 
be more specific, let us denote the sides as in 

T (chir). 

a, 
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Fig. 2. — T riangle s in the the continuous triangle space as 
defined by Eqs. IT1HT21 We show the same triangles as earlier, 
in Fig. Q] for the T< mix ) and T( chir ) triangle spaces. Equilateral 
triangles are centered in the origin. The dotted-dashed line refers 
to the right triangles, and divides the space to acute (inside) and 
obtuse (outside) triangles. Isosceles triangles are placed on the 



x-axis (where T, 



(cont J 



0). 



b and c in counter-clockwise order where a is the longest, 
and define 



a = 1 

(3 = 1 



b/a, 
c/a. 



(5) 
(6) 



Using these values, it is easy to prove that by using the 
definitions of the following variables: 



a(a + (3) 



yi = - 



(3{a- 



v/« 2 + /3 2 ' 

2 2 
x% = x 1 -y 1 , 

y 2 = 2x 1 y 1 , 

one can define the triangle space coordinates as: 

(a + (3) (a 4 - 6a 2 /3 2 + (3 4 ) 



T-i(cont) 
V 



(a + 0)3 
(a + (3f 



4(a 



(a 2 + (3 2 ) 2 
- f3)af3(a 2 - f3 2 ) 



(a 2 + (3 2 ) 2 



(7) 

(8) 

(9) 
(10) 

,(H) 
(12) 



The above defined T( cont ) continuous triangle space has 
many advantages. It is a continuous function of the sides 
for all non-singular triangles, and also preserves chirality 
information. Furthermore, it spans a larger area, and 
misidentification of triangles (that may be very densely 
packed) is decreased. Some triangles in this space are 
shown in Fig. [21 

3.1.2. Optimal triangle sets 

As it was mentioned before, the total number of trian- 
gles that can be formed from N points is w iV 3 /6. Wide- 
field images typically contain 0(1O 4 ) points or more, and 
the total number of triangles that can be generated - a 
complete triangle list - is unpractical for the following 
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reasons. First, storing and handling such a large num- 
ber of triangles with typical computers is inconvenient. 
To give an example, a full triangulation of 10,000 points 
yields ~ 1.7 x 10 11 triangles. 

Second, this complete triangle list includes many tri- 
angles that are not optimal to use. For example large 
triangles can be significantly distorted in X with respect 
to 7Z, and thus are represented by substantially different 
coordinates in the triangle space. The size of optimal 
triangles is governed by two factors: the distortion of 
large triangles, and the uncertainty of triangle parame- 
ters for small triangles that are comparable in size to the 
astrometric errors of the vertices. 

To make an estimate of the optimal size for triangles, 
let us denote the characteristic size of the image by D, 
the astrometric error by S, and the size of a selected 
triangle as L. For the sake of simplicity, let us ignore 
the distortion effects of a complex optical assembly, and 
estimate the distortion factor fd in a wide field imager 
as the difference between the orthographic a nd gnomonic 
projections fsee iCalabretta fc G rcisen 20 0^1 : 



f d w |(sin(d) - tan(d))/d| « |1 - cos(d)| 



(13) 



where d is the radial distance as measured from the cen- 
ter of the field. For the HATNet frames (d = D w 6° to 
the corners) this estimate yields fd ~ 0.005. The distor- 
tion effects yield an error of f^L /D in the triangle space 
- the bigger the triangle, the more significant the distor- 
tion. For the same triangle, astrometric errors cause an 
uncertainty of S/L in the triangle space that decreases 
with increasing L. Making the two errors equal, 



fd-L 
D 



S_ 

T 



an optimal triangle size can be estimated by 



L 



<S- D 



opt 



fd 



(14) 



(15) 



In our case d = 2048 pixels (or 6°), fd = 0.005 and the 
centroid uncertainty for an I = 11 star is 5 = 0.01, so 
the optimal size of the triangles is L opt ~ 60 — 70 pixels. 

Third, dealing with many triangles may result in a tri- 
angle space that is over-saturated by the large number of 
points, and may yield unexpected matchings of triangles. 
In all definitions of the previous subsection, the area of 
the triangle space is approximately unity. Having trian- 
gles with an error of a in triangle space and assuming 
them to have a uniform distribution, allowing a 3ct spac- 
ing between them, and assuming a — 5/L opt , the number 
of triangles is delimited to: 



Trr 



1 



(Say 



D 

v7s 



(16) 



In our case (see values of D, fd and 5 above) the for- 
mer equation yields T opt ~ 2 x 10 6 triangles. Note that 
this is 5 orders of magnitude smaller than a complete 
triangulation (O(10 n )). 

3.1.3. The extended Delaunay triangulation 

Delaunay triangulation (see IShewchukl lT996) is a fast 
and robust way of generating a triangle mesh on a point- 
set. The Delaunay triangles are disjoint triangles where 





Fig. 3. — Triangulations of some randomly distributed points: 
the left panel shows the Delaunay triangulation (60 triangles in 
total) the right panel exhibits the i = 1 extended triangulation 
(312 triangles) of the same point set. 

the circumcircle of any triangle contains no other points 
from any other triangle. This is also equivalent to the 
most efficient exclusion of distorted triangles in a local 
triangulation. For a visual example of a Delaunay trian- 
gulation of a random set of points, see the left panel of 
Fig. El 

Following Euler's theorem (also known as the polyhe- 
dron formula) , one can calculate the number of triangles 
in a Delaunay triangulation of N points: 



T D = 2N - 2 - C, 



(17) 



where C is the number of edges on the convex hull of the 
point set. For large values of N, T& can be estimated 
as 27V, as 2 + C is negligible. Therefore, if we select a 
subset of points (from TZ or X) where neighboring ones 
have a distance of L opt , we get a Delaunay triangulation 
with approximately 2D 2 /L^ pt triangles. The D, 8 and fd 
values for HAT images correspond to w 6000 triangles, 
i.e. 3000 points. In our experience, this yields very fast 
matching, but it is not robust enough for general use, 
because of the following reasons. 

Delaunay triangulation is very sensitive for removing a 
point from the star list. According to the polyhedron for- 
mula, on the average, each point has 6 neighboring points 
and belongs to 6 triangles. Because of observational ef- 
fects or unexpected events, the number of points fluctu- 
ates in the list. To mention a few examples, it is custom- 
ary to build up X from the brightest stars in an image, 
but stars may get saturated or fall on bad columns, and 
thus disappear from the list. Star detection algorithms 
may find sources depending on the changing full-width 
at half maximum (FWHM) of the frames. Transients, 
variable stars or minor planets can lead to additional 
sources on occasions. In general, if one point is removed, 
6 Delaunay triangles are destroyed and 4 new ones are 
formed that are totally disjoint from the 6 original ones 
(and therefore they are represented by substantially dif- 
ferent points in the triangle space). Removing one third 
of the generating points might completely change the tri- 
angulation 4 . 

Second, and more important, there is no guarantee 
that the spatial density of points in TZ and X is similar. 
For example, the reference catalog is retrieved for stars 
with different magnitude limits than those found on the 
image. If the number of points in common in 1Z and X is 
only a small fraction of the total number of points, the 

4 Imagine a honey-bee cell structure where all central points 
of the hexagons are added or removed: these two construction 
generates disjoint Delaunay triangulations. 
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triangulation on the reference and image has no common 
triangles. Third, the number of the triangles with De- 
launay triangulation (Td) is definitely smaller than T opt ; 
i.e. the triangle space could support more triangles with- 
out much confusion. 

Therefore, it is beneficial to extend the Delaunay tri- 
angulation. A natural way of extension can be made as 
follows. Define a level £ and for any given point (P) se- 
lect all points from the point set of N points that can 
be connected to P via maximum £ edges of the Delau- 
nay triangulation. Following this, one can generate the 
full triangulation of this set and append the new trian- 
gles to the whole triangle set. This procedure can be 
repeated for all points in the point set at fixed I. For 
self-consistence, the £ = case is defined as the Delau- 
nay triangulation itself. If all points have 6 neighbors, 
the number of "extended" triangles per data point is: 

T e = {i£ 2 + U+ \){U 2 + 3£)(3£ 2 + 3£- l)/6 (18) 

for t > 0, i.e. this extension introduces 0(£ 6 ) new trian- 
gles. Because some of the extended triangles are repeti- 
tions of other triangles from the original Delaunay trian- 
gulation and from the extensions of another points, the 
final dependence only goes as 0{T^>£ 2 ). We note that our 
software implementation is slightly different, and the ex- 
pansion requires 0(N£ 2 ) time and automatically results 
in a triangle set where each triangle is unique. To give an 
example, for N — 10, 000 points the Delaunay triangula- 
tion gives 20, 000 triangles, the £ = 1 extended triangu- 
lation gives ~ 115,000 triangles, £ = 2 some ~ 347,000 
triangles, £ = 3 875, 000 and £ = 4 - 1, 841, 000 triangles, 
respectively. The extended triangulation is not only ad- 
vantageous because of more triangles, and better chance 
for matching, but also, there is a bigger variety in size 
that enhances matching if the input and reference lists 
have different spatial density. 

3.1.4. Matching the triangles in triangle space 

If the triangle sets for both the reference and the in- 
put list are known, the triangles can be matched in the 
normalized triangle space (where they are represented 
by two dimensional points) using the symmetric point 
matching as described in § [21 

In the next step we create a Nr x Nj "vote" matrix 
V , where Nr and Nj are the number of points in the 
reference and input lists that were used to generate the 
triangulations, respectively. The elements of this matrix 
have an initial value of 0. Each matched triangle cor- 
responds to 3 points in the reference list (identified by 
r i, r 2, r^) and 3 points in the input list (ii, %2 and is). 
Knowing these indices, the matrix elements V ri i 1 , V r2 i 2 
and V r3 i 3 are incremented. The magnitude of this in- 
crement (the vote) can depend on the distances of the 
matching triangles in the triangle space: the closer they 
are, the higher votes these points get. In our implemen- 
tation, if Nt triangles are matched in total, the closest 
pair gets Nt votes, the second closest pair gets Nt — 1 
votes, and so on. 

Having built up the vote matrix, we select the great- 
est elements of this matrix, and the appropriate points 
referring to these row and column indices are considered 
as matched sources. We note that not all of the posi- 
tive matrix elements are selected, because elements with 
smaller votes are likely to be due to misidentifications. 



We found that in practice the upper 40% of the matrix 
elements yield a robust match. 

3.2. The unitarity of the transformations 

If an initial set of the possible point-pairs are known 
from triangle-matching, one can fit a smooth function 
(e.g. a polynomial) that transforms the reference set to 
the input points. Our assumption was that the dominant 
term in our transformation is the similarity transforma- 
tion, which implies that the homogeneous linear part of it 
should be almost almost a unitarity operator. 5 After the 
transformation is determined, it is useful to measure how 
much we diverge from this assumption. As mentioned 
earlier (§0, similarity transformations can be written 
as 



XAr 



(19) 



where A ^ 0, and the a, 6, c, d matrix components are the 
sine and cosine of a given rotational angle, i.e. a = c and 
b = -c. 

If we separate the homogeneous linear part of the 
transformation, as described by a matrix similar to that 
in Eq. 1191 it will be a combination of rotation and dila- 
tion with possible inversion if \a\ « \d\ and |c| » |£>|. We 
can define the unitarity of a matrix as: 



A := 



(aTd) 2 + (b±c) 2 
a 2 + b 2 + c 2 + d 2 



(20) 



where the ± indicates the definition for regular and in- 
verting transformations, respectively. For a combination 
of rotation and dilation, A is zero, for a distorted trans- 
formation A rj fd <C 1 • 

The A unitarity gives a good measure of how well the 
initial transformation was determined. It happens occa- 
sionally that the transformation is erroneous, and in our 
experience, in these cases A is not just larger than the 
expectational value of fd, but it is w 1. This enables 
fine-tuning of the algorithm, such as changing chirality 
of the triangle space, or adding further iterations till sat- 
isfactory A is reached. 

3.3. Point matching in practice 

In practice, matching points between the 1Z reference 
and X image goes as the following: 

1. Generate two triangle sets Tr and Tj on 1Z and X, 
respectively: 

(a) In the first iteration, generate only Delaunay 
triangles. 

(b) Later, if necessary, extended triangulation can 
be generated with increasing levels of £. 

2. Match these two triangle sets in the triangle space 
using symmetric point matching. 

3. Select some possible point-pairs using a vote- 
algorithm (yielding No pairs). 

4. Derive the initial smooth transformation Fr1_,j us- 
ing a least-squares fit. 

5 Here AA + = /, where A + is the adjoint of A and I is the iden- 
tity, i.e. A is an orthogonal transformation with possible inversion 
and magnification. 
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(a) Check the unitarity of J- R ^j. 

(b) If it is greater than a given threshold (O(fd)), 
increase £ and go to step 1/b. If the unitarity 
is less than this threshold, proceed to step 5. 

(c) If we reached the maximal allowed £, try the 
procedure with triangles that are flipped with 
respect to each other between the image and 
reference, i.e. switch chirality of the T( cont ) 
triangle space. 

5. Transform 1Z using this initial transformation to 
the reference frame of the image (1Z' = T^^jlJZ)). 

6. Perform a symmetric point matching between 1Z' 
and X (yielding N\ > N pairs). 

7. Refine the transformation based on the greater 

number of pairs, yielding transformation T^ > _^ 1 , 
where i is the iteration number. 

8. If necessary, repeat points 5, 6 and 7 iteratively, 
increase the number of matched points, and refine 
the transformation. 

For most astrometric transformations and distortions 
it holds that locally they can be approximated with a 
similarity transformation. At a reasonable density of 
points on X and 2", the triangles generated by a (pos- 
sibly extended) Delaunay triangulation are small enough 
not to be affected by the distortions. The crucial step is 
the initial triangle matching, and due to the use of lo- 
cal triangles, it proves to be robust procedure. It should 

(i) 

be emphasized that J~r_^j can be any smooth transfor- 
mation, for example an afflne transformation with small 
shear, or polynomial transformation of any reasonable 
order. The optimal value of the order depends on the 
magnitude of the distortion. The detailed description of 
fitting such models and functions c an be found in var ious 
textbooks (see e.g. Chapter 15. in lPress et alJll99^L It 
is noteworthy that in step 7 one can perform a weighted 
fit with possible iterative rejection of n-cr outlier points. 

4. SOFTWARE IMPLEMENTATION AND APPLICATIONS 
4.1. Software implementation 

The coordinate matching and coordinate transform- 
ing algorithms are implemented in two stand-alone bi- 
nary programs written in ANSI C. The program named 
grmatch matches point sets, including triangle space gen- 
eration, triangle matching, symmetric point matching 
and polynomial fitting, that is steps 1 through 4 in § 13.31 
The other program, grtrans, transforms coordinate lists 
using the transformation coefficients that are output by 
grmatch. The grtrans code is also capable of fitting 
a general polynomial transformation between point-pair 
lists if they are paired or matched manually or by an 
external software. We should note that in the case of 
degeneracy, e.g. when all points are on a perfect lattice, 
the match will fail. 

Both programs are part of the FiHAT/HATpipe package 
that is under development for the massive data reduction 
of the HATNet data-flow. They can be easily embed- 
ded into UNIX environments, as both of them parses 
wide-range of command line arguments for defining the 



structure of the input data and fine-tuning the algorithm. 
The programs are also capable of redirecting their input 
and/or output to standard streams. 

By combining grmatch and grtrans, one can easily 
derive the World Coordinate System (WCS) informa- 
tion for a FITS data file. Output of WCS keywords is 
now fully implemented in grtrans, follo wing the con - 
ventions of the package wcstools 6 fsee iMiiikl 12002'). 
Such information is very useful for manual an alysis with 
well-k nown FITS viewers (e.g. ds9, see Dove fc Mandel 
2003). For a more detaile d description of WCS see 
Cala bretta fc Gre iggn[_l[2 002l) a nd on the representation 
of distortions see lShupg l)2005l) . 

The package containing the programs grmatch 
and grtrans and other related software are ac- 
cessible after registration from the web address 
http : //www .hatnet . hu/sof tware. 

4.2. Performance on large data sets 

We used grmatch and grtrans to perform astrometry 
and star identification on a larg e set of ima ges taken by 
the HAT Network of telescopes l)Bakosl200|) . The results 
presented in this paper are based on observations orig- 
inating from the following HATNet telescopes: HAT-5, 
HAT-6 and HAT- 7 located at the Fred Lawrence Whipple 
Observatory (FLWO), Arizona 7 , plus HAT-8 and HAT- 
9 at the Smithsonian Submillimeter Array roof (SMA) 
atop Mauna Kea, Hawaii. To recall, these telescopes 
have an identical setup: / = 200mm, d = f/1.8 tele- 
photo lens and a 2K x 2K CCD yielding an 8° x 8° FOV. 
In order to test the method on different instruments, we 
also performed astrometry on data taken by the TopHAT 
(FLWO) photometry follow-up instrument. TopHAT is a 
0.26m diameter, f/5 Ritchey-Cretien design with a Baker 
wide-field corrector, aided by a 2K x 2K Marconi chip, 
yielding 1.3° FOV. 

The steps of the astrometry and identification were 
the following. First, for all observed fields, reference 
star lists were g enerated using the 2MASS catalog (see 
iSkrutskiel l2006|) as reference. These reference lists in- 
clude the source identifiers, the original celestial coor- 
dinates (RA, Dec), an estimated I-band magnitude and 
the (£,77) projecte d coordinates of the stars. We used 
arc projection fsee iCalabretta fc Greisenll2002|) centered 
at the nominal center of the given field, and scaling of 
the projection was unity in the manner that a star lo- 
cated at 1 degree distance from the center of the given 
celestial field has a unit distance in the (£, 77) plane from 
the origin in the reference list. The FOV of the reference 
lists were a bit wider than the nominal FOV of the HAT 
telescopes so as to ensure a full overlap between the two 
lists in spite of the small uncertainties in the positioning 
of the telescopes. 

Second, an input star list was generated for each image, 
using our star detection algorithm f istar (also part of 
FiHAT/HATpipe) that detects and fits star-like objects 
above a given S/N threshold. This detection yields a set 
of input lists that include the pixel coordinates, (X, Y) of 
the stars and other quantities (including the flux, FWHM 
and the shape parameters). 

6 http:/ /tdc-www. harvard.edu/wcstools/ 

7 HAT-10 is also located at FLWO, but its data was not used in 
this paper. 
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Fig. 4. — Vector plots of the difference between the transformed 
reference and the input star coordinates for a typical HAT field. 
The left panel shows the difference for second-order, the right panel 
for fourth-order polynomial fits. 

Third, for each image, the input star list and the 
relevant reference star list was matched using the pro- 
gram grmatch. The match was performed between the 
projected reference coordinates, (£,77), and the detected 
pixel coordinates, (X, Y). The program outputs two files: 
the list of the matched lines (this is the "match" file) and 
a small file that includes the fitted polynomial transfor- 
mation parameters and some statistical data (this is the 
"transformation" file). It should be emphasized that the 
match was not done directly using the original celestial 
coordinates, as they exhibit an unwanted curvature in 
the field. 

Finally, the reference star list (£, rj) was transformed 
by the program grtrans into the system of the image 
(X, Y) using the "transformation" file. The transformed 
list shows where each star with a given identifier would 
fall on the image. The transformation can be also used 
to calculate the WCS information for the given image. 

We note that the crucial part of the process is the third 
step. This can be fine-tuned by many parameters, one of 
the most important being the polynomial order. For a 
small FOV (less than one degree) and small distortions, 
linear or second order polynomials yield good result. For 
HAT images, we had to increase the order up to 6 to 
achieve the best results. Fig. 0] exhibits two vector plots 
that show the difference between the transformed refer- 
ence coordinates and the detected star coordinates using 
a second and a fourth-order polynomial transformation 
for a typical HAT image. In the first case, using a second- 
order fit, definite radial structures remain, and the stars 
located at the corners of the image are not even matched 
due to the large distortions in the optics. Using a fourth- 
order fit, however, all segments of the image are matched 
and the residuals are also smaller. These small residuals 
can be better visualized if only the difference between 
one of the coordinates is shown in a gradient plot: Fig. El 
illustrates the difference between the Y coordinates for 
the same image using a fourth- and sixth-order polyno- 
mial fit. While there is a definite residual structure in 
the fourth-order fit, it disappears using the sixth-order 
polynomial transformation. 

As regards statistics, we performed the astrometry and 
source identification for 243,447 HAT images that had 
been acquired between the beginning of 2003 and June 
2006. The wide-field telescopes observed 52 individual 
and almost non-overlapping fields between b = —30° and 
b = +74° galactic latitudes. 

We initiated the processing with the following param- 
eters. For the triangulation, the 3000 brightest sources 
were used from both the reference catalog and the de- 
tected stars. The critical unitarity was set to 0.01, there- 
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Fig. 5. — The difference between the Y coordinates of the 
transformed reference and the input star coordinates for a typical 
HAT field. The left panel shows the difference for fourth-order, the 
right panel for sixth-order polynomial fits. 

fore if the fitted initial transformation had an unitar- 
ity larger than this value, the level of the triangula- 
tion expansion was increased. The final transformation 
was determined using a weighted sixth-order polynomial 
fit. Because the astrometric errors of brighter stars are 
smaller, we weighted data-points based on their mag- 
nitude during the fit. Finally, the maximal distance of 
matches was set to one pixel to reject false identifications. 

Astrometry and cross-identification of sources was suc- 
cessful for 238,353 images. The remaining 5094 images 
were analyzed manually, and we found that only 13 of 
them were good enough to expect astrometry to suc- 
ceed; all the other ones were cloudy or had shown var- 
ious other errors. Astrometry on these 13 images also 
succeeded by decreasing the number of stars for triangu- 
lation to 2000. It means that a completely automatic run 
yielded 99.995% success rate, and the other images were 
also matched by applying small changes to the fine-tune 
parameters. 

In order to test the algorithm with a different instru- 
ment, we also performed astrometry on 22,936 TopHAT 
images taken in 2005. The only difference in the proce- 
dure was that the polynomial transformation was only of 
2nd order. The success ratio was 93%, but 90% of the 
frames where astrometry failed were cloudy with virtu- 
ally no stars. Astrometry also failed on very short expo- 
sure (lOsec) V-band frames. Fine-tuning the parameters 
(number of triangles, input lists) resolved most of these 
cases. 

The following statistics was done on the wide-field 
HAT frames. The median value of the number of 
matched sources relative to the number of stars in the 
reference or the input list was 98.38% ± 0.31% (median 
deviation). The average value of the CPU usage was 
0.77 ± 0.22 seconds per frame on a 64-bit AMD Opteron 
machine running at 2 GHz. Astrometry was successful 
on 96.78% of the images using Delaunay-triangulation 
without extended triangles (CPU: 0.73sec), while 0.49% 
of the frames were processed at level £ = 1 extended tri- 
angulation (CPU: 1.79 sec), 0.06% at I = 2 (CPU: 2.22 
sec), 33 images at £ = 3 (CPU: 3.67 sec) and 1334+5094 
images at I = 4 extended triangulation (CPU: 5.20 sec). 
Here 5094 refers to those images were astrometry failed 
even at I = 4, mostly because of bad data quality (see be- 
fore). The reason for the Delaunay triangulation being 
successful for 96% of the wide-field HAT frames with- 
out extended triangulation is because the HAT instru- 
ments perform homogeneous data acquisition, and are 
very well characterized (zero-points, saturation). Thus, 
the 2MASS reference catalogs can be retrieved for the 
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given field in such a way that there are many sources 
in common. In general applications, however, when the 
saturation limit and faint magnitude limits of an image 
have only a crude estimate, the extended triangulation 
is essential. 

Although the procedure is fast, we note that the most 
time consuming part of the process is the triangulation 
generation and the triangle matching itself. On average, 
this required more than 60% of the total time, and at t = 
4, 92% of the time. The median value of the fit residuals 
was 0.06 pixels, while the median value of the unitarities 
was 0.0042. The latter is in a quite good agreement with 
the expected value of the nonlincarity factor fd sa 0.005. 

4.3. Comparison with other implementations 

We also compared the performance of the program 
grmatch with an existing implementation within IRAF, 
namely the images . immatch package with its relevant 
tasks xyxymatch, geomap and geoxytran. The steps of 
the point matching were as described in § 13.31 First, 
an initial set of possible pairs were established us- 
ing xyxymatch and the "triangles" option as match- 
ing method. Because the triangle sets generated by 
xyxymatch are full triangulations, we limited our input 
lists to the brightest sources, otherwise the 0(N 3 ) depen- 
dence of the number of the triangles would have resulted 
in an unrealistically long matching time. Second, the 
initial transformation was fitted using geomap, and fol- 
lowed by transforming the reference catalog to the frame 
of the input list using geoxytran and this fit. Third, the 
transformed reference and the original input list were also 
matched by xyxymatch, but this time using the "toler- 
ance" matching method. Finally, this new list of point 
pairs were used again to refine the geometrical transfor- 
mation by geomap. 

The comparison of grmatch and the IRAF 
images . immatch implementation was based on 950 
individual images, all acquired by TopHAT from the 
same FOV. We note that we had to use the relatively 
narrow field TopHAT for the comparison, as the triangle 
match on the original 8.2° HATNet frames is almost 
hopeless given the spatial distortions, the large number 
of stars, and the difficulty to select the brightest stars 
and at the same time retain a small total number of 
selected sources (in order to be able to cope with a full 
triangulation). On each image there were approximately 
800 — 900 detected stars, depending on the airmass 
or thin clouds. For the triangulation and the initial 
xyxymatch fit we used the 35 brightest sources both 
from the reference catalog and from the input star lists. 
We found that grmatch required ~ 0.1 sec CPU time on 
average while the whole procedure using these IRAF- 
based tasks, as described above, required ^5 — 7 sec net 
CPU time for a single image. Both algorithms yielded 
the same transformation coefficients and found the 
same number of pairs, however, in 3 cases, the number 
of sources used for triangulation had to be increased 
manually to 40 or 45. It is noteworthy that although 
the IRAF version proved to be significantly slower, the 
time consuming part was the first xyxymatch matching. 
All other tasks, including the second matching (with 
"tolerance" option) required only a fraction of a second 
per image. 



5. SUMMARY 

In this paper we present a robust algorithm for cross- 
matching two two-dimensional point lists. The task is 
twofold: finding the smooth spatial transformation be- 
tween the lists, and cross-matching points. These two 
steps are intertwined, and are performed in an iterative 
way till satisfactory transformation and matching rate 
are reached. We make only very basic assumptions that 
hold for almost all astronomical applications, including 
wide-field surveys with distorted fields and large num- 
ber of sources. Namely, the transformation between the 
point lists is dominantly a similarity transformation (ar- 
bitrary shift, rotation, magnification, inversion). A sig- 
nificant distortion term can be present, given it can be 
linearized on the scale-length of the average distance of 
neighboring points. 

In § [5] we briefly described symmetric point-matching 
in one and two dimensions, because this tool is used 
throughout the astrometry procedure. Finding the ini- 
tial transformation between the point lists is based on 
triangle matching. First we defined various normalized 
tri angle spaces i n § 13.1.11 The "mixed" triangle space 
of iValdes etaP <|l995) is a continuous function of the 
triangle parameters, but flipped triangles are not distin- 
guished. The "chiral" triangle space ensures that chiral- 
ity information is preserved, but this space is not contin- 
uous. We showed that it is possible to define a "continu- 
ous" triangle space that is both continuous and preserves 
chirality, and which, furthermore, spans a larger volume 
and diminishes confusion of triangles having similar co- 
ordinates. 

Taking into account the distortion of a field and the as- 
trometric errors, we calculated both the optimal size and 
number of triangles. For the typical setup of a HATNet 
telescope (8° x 8° FOV, distortion factor f d ~ 0.005) the 
optimal size is 0.2°, and the optimal number of triangles 
is less than 2 x 10 6 . We use Delaunay triangulation for 
generating the triangles of the triangle-space. This has 
the advantage of being fast, robust, and generating lo- 
cal triangles that are less prone to being distorted. In 
§ 13.1.31 we noted, however, that Delaunay triangulation 
is sensitive for removing or adding points to the list, and 
thus instable. We introduced an extension of this trian- 
gulation that is parametrized by an £ level. 

Having determined the transformation between the 
two lists, one can check how well our initial assumption 
of the dominant term being linear holds. In § 13.21 we 
introduced the unitarity of the transformation, a sim- 
ple scalar measure of this property. We described the 
practical details of the algorithm in § 13.31 and the actual 
software implementation (grmatch, grtrans) in § 14.11 

Finally, we ran the above programs on some 240,000 
frames taken by the wide-angle cameras of HATNet, plus 
20,000 frames acquired by the TopHAT telescope. The 
success rate was very close to 100%, and the routines 
handled the various pointing errors, defocusing and 6th 
order distortions in the wide fields. Both programs will 
become available in binary format for a wide range of 
architectures upon request from the authors. 

A. P. would like to thank the hospitality of the 
Harvard-Smithsonian Center for Astrophysics, where 
this work has been partially carried out. A. P. was 



Astrometry in Wide-Field Surveys 



9 



also supported by the Hungarian OTKA grant T- 
038437. The HATNet project is funded by NASA grant 
NNG04GN74G. G. A. B. wishes to acknowledge fund- 
ing by the NASA Hubble Fellowship grant HST-HF- 



01170. 01-A. Both authors would like to thank Istvan 
Domsa for the early development of triangle and point- 
matching codes. 



REFERENCES 



Akcrlof, C, et al.2000, AJ, 119, 1901 
Alonso, R., et al.2004, ApJ, 613, L153 

Bakos, G. A., Lazar, J., Papp, I., Sari, P., Green, E. M. 2002, PASP, 
114, 974 

Bakos, G. A., Noyes, R. W., Kovacs, G., Stanek, K. Z., Sasselov, 

D. D. and Domsa, I. 2004, PASP, 116, 266 
Calabretta, M. R., Greisen, E. W. 2002, A&A, 395, 1077 
Charbonneau, D., Brown, T. M., Burrows, A., Laughlin, G. 2006, 

in Conference Proceedings of Protostars and Planets V, 
|astro-ph/0603376| 
Gionis, A.: Computational Geometry: Nearest Neighbor Problem 

(lecture notes) 8 
Groth, E. J. 1986, AJ, 91, 1244 

Hartman, J. D., Bakos, G. A., Stanek, K. Z., Noyes, R. W. 2004, 
AJ, 128, 1761 

Joye, W. A., Mandel, E. 2003, ASP Conf. Ser., 295, 489 
Mink, D. J., 2002, ASP Conf. Ser, 281, 169 

Pepper, J., Gould, A., & Depoy, D. L.2004, AIP Conf. Proc. 713: 
The Search for Other Worlds, 713, 185 

Phillips, A. C, Davis, L. E. 1995, in Astronomical Data Analysis 
Software and Systems IV, ASP Conference Series Vol. 77, 297 
(eds. R. A. Shaw, H. E. Payne and J. J. E. Hayes) 

Pojmanski, G. 1997, Acta Astronomica, 47, 467 



Press, W. H., Teukolsky, S. A., Vetterling, W.T., Flannery, B.P. 

1992, Numerical Recipes in C: the art of scientific computing, 

Second Edition, Cambridge University Press 
Shewchuk, R. J. 1996, in Applied Computational Geometry: 

Towards Geometric Engineering, ed. M. C. Lin & D. Manocha 

(Berlin: Springer), 1148, 203 
Shupe, D. L., Moshir, Mehdrdad, Li, J.; Makovoz, D., Narron, R., 

Hook, R. N., 2005, ASP Conf. Ser., 347, 491 9 
Skrutskie, M. F., Cutri, R. M., Stiening, R., Weinberg, M. D., 

Schneider, S., Carpenter, J. M., Beichman, C, Capps, R., 

Chester, T., Elias, J., Huchra, J., Liebert, J., Lonsdale, C, 

Monet, D. C, Price, S., Seitzer, P., Jarrett, T., Kirkpatrick, J. 

D., Gizis, J., Howard, E., Evans, T., Fowler, J., Fullmer, L., 

Hurt, R., Light, R., Kopan, E. L., Marsh, K. A.,, McCallon, H. 

L., Tam, R., Van Dyk, S., and Wheelock, S., AJ, 131, 1163 
Stetson, P. B. 1989, in Advanced School of Astrophysics, Image 

and Data Processing/Interstellar Dust, ed. B. Barbury, E. Janot- 

Pacheco, A. M. Magalhaes and S. M. Viegas (Sao Paulo, Instituto 

Astronomico e Geofi'sico) 
Valdes, F. G., Campusano, L. E., Velasquez, J. D., Stetson, P. B. 

1995, PASP, 107, 1119 
Thiebaut, C, Boer, M. 2001, ASP Conf. Ser., 238, 388 



8 http://theory.stanford.edu/~nmishra/CS361-2002/lecturel2- 
scribe.pdf 

9 see also |http://spider.ipac.caltech.e du/staff/shupe/distortion _*vl.Q.htm| 



